Using Unknown Word Techniques to Learn Known Words
Authors
Abstract
Unknown words are a hindrance to the performance of hand-crafted computational grammars of natural language. However, words with incomplete or incorrect lexical entries pose an even bigger problem because they can cause a parsing failure despite being listed in the lexicon of the grammar. Such lexical entries are hard to detect and even harder to correct. We employ an error miner to pinpoint words with problematic lexical entries. An automated lexical acquisition technique is then used to learn new entries for those words, which allows the grammar to parse previously uncovered sentences successfully. We test our method on a large-scale grammar of Dutch and a set of sentences for which this grammar fails to produce a parse. The application of the method enables the grammar to cover 83.76% of those sentences with an accuracy of 86.15%.
Similar resources
earning evin night
We develop techniques for learning the meanings of unknown words in context. Working within a compositional semantics framework, we write down equations in which a sentence's meaning is some combination function of the meaning of its words. When one of the words is unknown, we ask for a paraphrase of the sentence. We then compute the meaning of the unknown word by inverting parts of the semant...
A Probabilistic Model for Semantic Word Vectors
Vector representations of words capture relationships in words’ functions and meanings. Many existing techniques for inducing such representations from data use a pipeline of hand-coded processing techniques. Neural language models offer principled techniques to learn word vectors using a probabilistic modeling approach. However, learning word vectors via language modeling produces representati...
Co-learning of Word Representations and Morpheme Representations
The techniques of using neural networks to learn distributed word representations (i.e., word embeddings) have been used to solve a variety of natural language processing tasks. The recently proposed methods, such as CBOW and Skip-gram, have demonstrated their effectiveness in learning word embeddings based on context information such that the obtained word embeddings can capture both semantic ...
The effect of teaching the etymology of words to learn and reinforce vocabulary by Iranian children
This study was an attempt to investigate the effect of teaching morphemes on young EFL learners. To this end, 23 young EFL learners in two different classes who studied at intermediate level were selected to participate in the present study. The participants were divided into two groups. One of the groups was treated as experimental and the other one was considered as the control group. Furthermore...
Subword-based Automatic Lexicon Learning for ASR
We present a framework for learning a pronunciation lexicon for an Automatic Speech Recognition (ASR) system from multiple utterances of the same training words, where the lexical identities of the words are unknown. Instead of only trying to learn pronunciations for known words, we go one step further and try to learn both spelling and pronunciation in a joint optimization. Decoding based on li...